Project 1.1: A RISC-V Assembler (Individual Project)
IMPORTANT INFO - PLEASE READ
The projects are part of your design project, which is worth 2 credit points. As such, they run in parallel with the actual course, so the due dates for projects and homework may be very close to each other. Start early and do not procrastinate.
Introduction to Project 1.1
In Project 1.1, you will build a simple one-pass RISC-V assembler. The assembler takes RISC-V code that contains no labels or symbols as input and outputs the corresponding machine code. You also need to implement basic error handling to detect invalid instructions. You can fetch the framework for Project 1.1 here on GitHub Classroom; try to use git and GitHub for version control.
Background of The Instruction Set
Registers
Please consult the RISC-V Green Sheet (PDF) for register numbers, instruction opcodes, and bitwise formats. Our assembler will support all 32 registers: zero, ra, sp, gp, tp, t0-t6, s0-s11, a0-a7. The numeric register names (e.g. x0, x1, x2) shall also be supported. Note that floating-point registers are not included in this project.
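As a rough illustration of the register rules above, the following sketch maps a register name to its number. The helper name parse_register and the table ABI_NAMES are hypothetical and not part of the framework; the name-to-number mapping follows the standard RISC-V calling convention, so double-check it against the Green Sheet.

```c
#include <stdlib.h>
#include <string.h>

/* ABI names indexed by register number (x0..x31), following the standard
 * RISC-V calling convention. Hypothetical table, not part of the framework. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Hypothetical helper: returns the register number 0..31, or -1 if the
 * name is not a valid register. */
int parse_register(const char *name)
{
    if (name == NULL || name[0] == '\0') {
        return -1;
    }
    /* ABI names such as "a0" or "sp". */
    for (int i = 0; i < 32; i++) {
        if (strcmp(name, ABI_NAMES[i]) == 0) {
            return i;
        }
    }
    /* Numeric names "x0" .. "x31". */
    if (name[0] == 'x' && name[1] >= '0' && name[1] <= '9') {
        char *end = NULL;
        long n = strtol(name + 1, &end, 10);
        if (*end == '\0' && n >= 0 && n <= 31) {
            return (int)n;
        }
    }
    return -1;
}
```

With this sketch, parse_register("a0") gives 10, while parse_register("x32") and parse_register("rp") both give -1, which the caller can report as a bad register.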
Instructions
We will have 42 instructions and 6 pseudo-instructions to assemble. The instructions are:
Instruction | Type | Opcode | Funct3 | Funct7/IMM | Operation |
add rd, rs1, rs2 | R | 0x33 | 0x0 | 0x00 | R[rd] ← R[rs1] + R[rs2] |
mul rd, rs1, rs2 | R | 0x33 | 0x0 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[31:0] |
sub rd, rs1, rs2 | R | 0x33 | 0x0 | 0x20 | R[rd] ← R[rs1] - R[rs2] |
sll rd, rs1, rs2 | R | 0x33 | 0x1 | 0x00 | R[rd] ← R[rs1] << R[rs2] |
mulh rd, rs1, rs2 | R | 0x33 | 0x1 | 0x01 | R[rd] ← (R[rs1] * R[rs2])[63:32] |
slt rd, rs1, rs2 | R | 0x33 | 0x2 | 0x00 | R[rd] ← (R[rs1] < R[rs2]) ? 1 : 0 |
sltu rd, rs1, rs2 | R | 0x33 | 0x3 | 0x00 | R[rd] ← (U(R[rs1]) < U(R[rs2])) ? 1 : 0 |
xor rd, rs1, rs2 | R | 0x33 | 0x4 | 0x00 | R[rd] ← R[rs1] ^ R[rs2] |
div rd, rs1, rs2 | R | 0x33 | 0x4 | 0x01 | R[rd] ← R[rs1] / R[rs2] |
srl rd, rs1, rs2 | R | 0x33 | 0x5 | 0x00 | R[rd] ← R[rs1] >> R[rs2] |
sra rd, rs1, rs2 | R | 0x33 | 0x5 | 0x20 | R[rd] ← R[rs1] >> R[rs2] |
or rd, rs1, rs2 | R | 0x33 | 0x6 | 0x00 | R[rd] ← R[rs1] | R[rs2] |
rem rd, rs1, rs2 | R | 0x33 | 0x6 | 0x01 | R[rd] ← R[rs1] % R[rs2] |
and rd, rs1, rs2 | R | 0x33 | 0x7 | 0x00 | R[rd] ← R[rs1] & R[rs2] |
lb rd, offset(rs1) | I | 0x03 | 0x0 | | R[rd] ← SignExt(Mem(R[rs1] + offset, byte)) |
lh rd, offset(rs1) | I | 0x03 | 0x1 | | R[rd] ← SignExt(Mem(R[rs1] + offset, half)) |
lw rd, offset(rs1) | I | 0x03 | 0x2 | | R[rd] ← Mem(R[rs1] + offset, word) |
lbu rd, offset(rs1) | I | 0x03 | 0x4 | | R[rd] ← U(Mem(R[rs1] + offset, byte)) |
lhu rd, offset(rs1) | I | 0x03 | 0x5 | | R[rd] ← U(Mem(R[rs1] + offset, half)) |
addi rd, rs1, imm | I | 0x13 | 0x0 | | R[rd] ← R[rs1] + imm |
slli rd, rs1, imm | I | 0x13 | 0x1 | 0x00 | R[rd] ← R[rs1] << imm |
slti rd, rs1, imm | I | 0x13 | 0x2 | | R[rd] ← (R[rs1] < imm) ? 1 : 0 |
sltiu rd, rs1, imm | I | 0x13 | 0x3 | | R[rd] ← (U(R[rs1]) < U(imm)) ? 1 : 0 |
xori rd, rs1, imm | I | 0x13 | 0x4 | | R[rd] ← R[rs1] ^ imm |
srli rd, rs1, imm | I | 0x13 | 0x5 | 0x00 | R[rd] ← R[rs1] >> imm |
srai rd, rs1, imm | I | 0x13 | 0x5 | 0x20 | R[rd] ← R[rs1] >> imm |
ori rd, rs1, imm | I | 0x13 | 0x6 | | R[rd] ← R[rs1] | imm |
andi rd, rs1, imm | I | 0x13 | 0x7 | | R[rd] ← R[rs1] & imm |
jalr rd, rs1, imm | I | 0x67 | 0x0 | | R[rd] ← PC + 4; PC ← R[rs1] + imm |
ecall | I | 0x73 | 0x0 | 0x000 | Transfers control to the operating system. a0 = 1 prints the value of a1 as an integer. a0 = 4 prints the string at address a1. a0 = 10 exits (end-of-code indicator). a0 = 11 prints the value of a1 as a character. |
sb rs2, offset(rs1) | S | 0x23 | 0x0 | | Mem(R[rs1] + offset) ← R[rs2][7:0] |
sh rs2, offset(rs1) | S | 0x23 | 0x1 | | Mem(R[rs1] + offset) ← R[rs2][15:0] |
sw rs2, offset(rs1) | S | 0x23 | 0x2 | | Mem(R[rs1] + offset) ← R[rs2] |
beq rs1, rs2, offset | SB | 0x63 | 0x0 | | if (R[rs1] == R[rs2]) PC ← PC + {offset, 1b'0} |
bne rs1, rs2, offset | SB | 0x63 | 0x1 | | if (R[rs1] != R[rs2]) PC ← PC + {offset, 1b'0} |
blt rs1, rs2, offset | SB | 0x63 | 0x4 | | if (R[rs1] < R[rs2]) PC ← PC + {offset, 1b'0} |
bge rs1, rs2, offset | SB | 0x63 | 0x5 | | if (R[rs1] >= R[rs2]) PC ← PC + {offset, 1b'0} |
bltu rs1, rs2, offset | SB | 0x63 | 0x6 | | if (U(R[rs1]) < U(R[rs2])) PC ← PC + {offset, 1b'0} |
bgeu rs1, rs2, offset | SB | 0x63 | 0x7 | | if (U(R[rs1]) >= U(R[rs2])) PC ← PC + {offset, 1b'0} |
auipc rd, offset | U | 0x17 | | | R[rd] ← PC + {offset, 12b'0} |
lui rd, offset | U | 0x37 | | | R[rd] ← {offset, 12b'0} |
jal rd, imm | UJ | 0x6f | | | R[rd] ← PC + 4; PC ← PC + {imm, 1b'0} |
NOTE: Since our assembler is a one-pass assembler and the input contains no labels, the offset in SB-type and U-type instructions and the imm in UJ-type instructions will be integers.
The pseudo-instructions are:
Pseudo-instruction | Format | Uses |
Branch on Equal to Zero | beqz rs1, label | beq |
Branch on not Equal to Zero | bnez rs1, label | bne |
Jump | j label | jal |
Jump Register | jr rs1 | jalr |
Load Immediate | li rd, immediate | lui, addi |
Move | mv rd, rs1 | addi |
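One possible way to handle the five single-instruction pseudo-instructions is to rewrite them into their base instruction before encoding. The sketch below uses the standard RISC-V expansions (x0 as the implicit zero operand or discarded link register) and the space-delimited syntax of this project. The helper name expand_pseudo and its interface are hypothetical, and li is left out because it may need two instructions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites a pseudo-instruction into its base
 * instruction as text. op1 and op2 are the operand strings; op2 may be
 * NULL for j and jr. Returns 1 if `out` was written, 0 if the mnemonic is
 * not one of the five pseudo-instructions handled here. */
int expand_pseudo(const char *mnemonic, const char *op1, const char *op2,
                  char *out, size_t out_size)
{
    if (strcmp(mnemonic, "beqz") == 0 && op2 != NULL) {
        /* beqz rs1 offset  ->  beq rs1 x0 offset */
        snprintf(out, out_size, "beq %s x0 %s", op1, op2);
    } else if (strcmp(mnemonic, "bnez") == 0 && op2 != NULL) {
        /* bnez rs1 offset  ->  bne rs1 x0 offset */
        snprintf(out, out_size, "bne %s x0 %s", op1, op2);
    } else if (strcmp(mnemonic, "j") == 0) {
        /* j offset  ->  jal x0 offset */
        snprintf(out, out_size, "jal x0 %s", op1);
    } else if (strcmp(mnemonic, "jr") == 0) {
        /* jr rs1  ->  jalr x0 rs1 0 */
        snprintf(out, out_size, "jalr x0 %s 0", op1);
    } else if (strcmp(mnemonic, "mv") == 0 && op2 != NULL) {
        /* mv rd rs1  ->  addi rd rs1 0 */
        snprintf(out, out_size, "addi %s %s 0", op1, op2);
    } else {
        return 0;
    }
    return 1;
}
```

Rewriting to text keeps the encoder free of special cases, at the cost of parsing the line twice; filling in the operands directly is an equally valid design.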
For further reference, here are the bit lengths of the instruction components.
R-TYPE | funct7 | rs2 | rs1 | funct3 | rd | opcode |
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
I-TYPE | imm[11:0] | rs1 | funct3 | rd | opcode |
Bits | 12 | 5 | 3 | 5 | 7 |
S-TYPE | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
Bits | 7 | 5 | 5 | 3 | 5 | 7 |
SB-TYPE | imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode |
Bits | 1 | 6 | 5 | 5 | 3 | 4 | 1 | 7 |
U-TYPE | imm[31:12] | rd | opcode |
Bits | 20 | 5 | 7 |
UJ-TYPE | imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode |
Bits | 1 | 10 | 1 | 8 | 5 | 7 |
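The field layouts above translate fairly directly into shifts and masks. The following is a minimal sketch with hypothetical function names (encode_r, encode_i, and so on) that are not part of the framework. It assumes that the integer given for a branch or jal is a byte offset, as Venus treats it; compare against test/test.ref before relying on that.

```c
#include <stdint.h>

/* R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode */
uint32_t encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* I-type: imm[11:0] | rs1 | funct3 | rd | opcode */
uint32_t encode_i(int32_t imm, uint32_t rs1, uint32_t funct3,
                  uint32_t rd, uint32_t opcode)
{
    uint32_t u = (uint32_t)imm & 0xFFFu;
    return (u << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}

/* S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode */
uint32_t encode_s(int32_t imm, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t opcode)
{
    uint32_t u = (uint32_t)imm;
    return (((u >> 5) & 0x7Fu) << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | ((u & 0x1Fu) << 7) | opcode;
}

/* SB-type: imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode
 * ASSUMPTION: `offset` is a byte offset, so bit 0 is dropped. */
uint32_t encode_sb(int32_t offset, uint32_t rs2, uint32_t rs1,
                   uint32_t funct3, uint32_t opcode)
{
    uint32_t u = (uint32_t)offset;
    return (((u >> 12) & 0x1u) << 31) | (((u >> 5) & 0x3Fu) << 25) |
           (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
           (((u >> 1) & 0xFu) << 8) | (((u >> 11) & 0x1u) << 7) | opcode;
}

/* U-type: imm[31:12] | rd | opcode, where imm20 is the 20-bit operand. */
uint32_t encode_u(uint32_t imm20, uint32_t rd, uint32_t opcode)
{
    return ((imm20 & 0xFFFFFu) << 12) | (rd << 7) | opcode;
}

/* UJ-type: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode
 * ASSUMPTION: `offset` is a byte offset, so bit 0 is dropped. */
uint32_t encode_uj(int32_t offset, uint32_t rd, uint32_t opcode)
{
    uint32_t u = (uint32_t)offset;
    return (((u >> 20) & 0x1u) << 31) | (((u >> 1) & 0x3FFu) << 21) |
           (((u >> 11) & 0x1u) << 20) | (((u >> 12) & 0xFFu) << 12) |
           (rd << 7) | opcode;
}
```

For example, add x1 x2 x3 corresponds to encode_r(0x00, 3, 2, 0x0, 1, 0x33), which yields 0x003100B3, and addi a0 a0 -1 corresponds to encode_i(-1, 10, 0x0, 10, 0x13), which yields 0xFFF50513.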
Getting Started
File Structure and Usage
The directory tree of the framework should look like the following:
.
├── inc
│   ├── assembler.h
│   └── util.h
├── Makefile
├── main.c
├── src
│   ├── assembler.c
│   └── util.c
└── test
    ├── test.ref
    └── test.S
main.c is the entry point of the whole assembler. You should not modify this file.
assembler.c and assembler.h are where you implement the assembler function.
util.c and util.h contain some helper functions. You can also add useful functions there.
The test directory contains a basic test and the corresponding result.
Build & Execute
- Run make to compile the code; the assembler executable will be named main.
- Alternatively, build the code with CMake. First create a directory named build, then run cmake .. && make inside build. The executable will be build/main.
- To run the assembler, type main input_file output_file. input_file contains the RISC-V instructions (see below for a detailed description), and output_file is where your results are written.
- Run make test to test your code with test/test.S; your output file will be test/test.out.
Input & Output
Input
Input will be a file containing RISC-V instructions. You can assume that there are no empty lines or comments and that each line ends with a \n. We will use a space as the delimiter instead of a comma, e.g. add x1 x2 x3.
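Given this input format, splitting a line into tokens can be done with strtok. The sketch below is one way to do it; the names split_line, split_mem_operand, and MAX_TOKENS are hypothetical and not part of the framework. Both helpers modify the buffer in place, so pass them a writable copy of the line.

```c
#include <string.h>

#define MAX_TOKENS 4  /* mnemonic plus at most three operands */

/* Hypothetical helper: splits `line` in place on spaces and the trailing
 * newline. Stores up to MAX_TOKENS pointers in tokens[] and returns how
 * many were stored. */
int split_line(char *line, char *tokens[MAX_TOKENS])
{
    int count = 0;
    char *tok = strtok(line, " \n");
    while (tok != NULL && count < MAX_TOKENS) {
        tokens[count++] = tok;
        tok = strtok(NULL, " \n");
    }
    return count;
}

/* Hypothetical helper: splits a load/store operand of the form
 * "offset(reg)" in place. Returns 1 on success, 0 if the parentheses are
 * missing or misplaced. */
int split_mem_operand(char *operand, char **offset, char **reg)
{
    char *lparen = strchr(operand, '(');
    char *rparen = strrchr(operand, ')');
    if (lparen == NULL || rparen == NULL || rparen < lparen ||
        rparen[1] != '\0') {
        return 0;
    }
    *lparen = '\0';  /* terminate the offset string */
    *rparen = '\0';  /* terminate the register string */
    *offset = operand;
    *reg = lparen + 1;
    return 1;
}
```

For the line lw a0 0(a1), split_line produces the three tokens lw, a0, and 0(a1), and split_mem_operand then separates the last one into 0 and a1. Each part still has to be validated as an immediate or a register.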
Output
Output should be RISC-V machine code. You should use the function dump_code in src/util.c when outputting machine code. This function requires a file handle and a uint32_t variable as parameters, which should be the output file and the code to be dumped. Do not use your own output function; otherwise, there may be format problems. Also, do not change the output format in dump_code, since we will use your util.c when grading.
Error Handling
If the input file contains illegal instructions, you should detect them and write error information to the output file. You should use the function dump_error_information in src/util.c for outputting error information. When an error occurs, you should continue to assemble the remaining instructions and keep outputting results and errors. Also, you should not terminate the whole program directly using exit. Quitting unexpectedly will be treated as a runtime error.
To simplify error handling, we promise that there will be exactly one space between tokens. You also do not need to handle instructions with too many or too few operands, like addi a0 a1 or addi a0 a0 a0 1. Load/store instructions will always be in the correct format, e.g. lw a0 0(a1), but the correctness of the registers and the offset is not guaranteed.
Here are the situations you need to consider in this project:
- Non-existent instruction: All supported instructions are listed above, and any other instruction should be treated as illegal.
- Bad registers: Wrong register names or register numbers that are out of range should be detected, e.g. rp, x32. You don't need to handle situations like x01 and a-1.
- Bad immediate or offset: The imm or offset in an instruction may not be a number, e.g. addi a0 a1 a0.
- Immediate out of range: The immediate in some instructions is restricted to a certain range, since the number of bits available to represent imm is limited. For example, imm in addi should be between -2048 and 2047. You can refer to Venus and the RISC-V manual for more information about these limits.
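The last two cases in the list above (not a number, and out of range) can both be checked with strtol. This is a minimal sketch; the helper name parse_imm is hypothetical, and it assumes immediates are written in decimal. If the reference also accepts hexadecimal, the base argument would need to change.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses `text` as a decimal integer and checks that
 * lo <= value <= hi. Returns 1 and stores the value in *out on success;
 * returns 0 if `text` is not a number or is out of range.
 * ASSUMPTION: immediates are written in decimal (base 10). */
int parse_imm(const char *text, long lo, long hi, long *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || text[0] == '\0') {
        return 0;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    /* Reject overflow, empty conversions, and trailing garbage ("12x"). */
    if (errno == ERANGE || end == text || *end != '\0') {
        return 0;
    }
    if (value < lo || value > hi) {
        return 0;
    }
    *out = value;
    return 1;
}
```

For addi, the call would be parse_imm(token, -2048, 2047, &imm). A return value of 0 covers both addi a0 a1 a0 (not a number) and addi a0 a1 2048 (out of range).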
Testing
Diff
Use diff file1 file2 to compare your output with the reference answer. Note that we will use diff to check your answer. To see how to interpret diff results, click here
Valgrind
To check for memory leaks, you can use Valgrind by running valgrind --tool=memcheck --leak-check=full --track-origins=yes main input_file output_file
Venus
Venus is a powerful assembler, and you can use it to check the correctness of your code.
First type RISC-V instructions on the editor page. Then, on the simulator page, you can see the machine code of each instruction. You can also use the Dump button to collect all the machine code as a reference.
Tips
- The immediate in auipc and lui should be between 0 and 1048575. Venus treats this immediate as an unsigned integer by default, while the official manual does not mention this. We choose to follow Venus. For auipc, since the starting address of text is smaller than that of data, PC-relative addresses are always larger than the current PC, which gives a non-negative offset. lui loads the upper part of the immediate into the register and does not care about the sign.
- The immediate in jal should be between -1048576 and 1048575. For some reason Venus does not limit this immediate, even though immediates outside this range cannot be represented. However, we are going to follow the hardware limitation.
- This project needs a lot of string splitting. You may find strtok useful.
- You need to check whether the immediate in an li instruction is between -2048 and 2047. If so, li should be translated into a single addi instruction. Otherwise, it will be translated into lui and addi.
- Try to generate your own test cases. Code needs testing.
- Don't forget to write comments frequently.
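For the li tip above, the two-instruction case needs some care: addi sign-extends its 12-bit immediate, so when bit 11 of the value is set, the upper part passed to lui has to be increased by one to compensate. The sketch below shows this standard split. The helper name split_li is hypothetical, and you should compare the results with Venus and the reference output.

```c
#include <stdint.h>

/* Hypothetical helper: decides how to translate "li rd imm".
 * Returns 1 if a single "addi rd x0 lower" suffices, or 2 if
 * "lui rd upper" followed by "addi rd rd lower" is needed.
 * In the second case (upper << 12) + lower equals imm modulo 2^32,
 * with lower in [-2048, 2047] and upper in [0, 1048575]. */
int split_li(int32_t imm, uint32_t *upper, int32_t *lower)
{
    if (imm >= -2048 && imm <= 2047) {
        *upper = 0;
        *lower = imm;
        return 1;
    }

    uint32_t u = (uint32_t)imm;
    uint32_t low12 = u & 0xFFFu;

    /* Sign-extend the low 12 bits, as addi will do at run time. */
    *lower = (low12 & 0x800u) ? (int32_t)low12 - 4096 : (int32_t)low12;

    /* Adding 0x800 before shifting rounds the upper part up by one
     * exactly when the low part is negative. */
    *upper = ((u + 0x800u) >> 12) & 0xFFFFFu;
    return 2;
}
```

For example, li a0 2048 splits into upper = 1 and lower = -2048, since 4096 - 2048 = 2048. The upper value always lands in the 0 to 1048575 range required for lui.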
Submission
You should submit your code via GitHub. Please follow the guidance on Gradescope to submit your code on GitHub. Note that we will not use your main.c or Makefile for grading. The compilation flags will be -Wpedantic -Wall -Wextra -Wvla -Werror -std=c11.